VERIFIED STATEMENT (DECLARATION) CLAIMING SMALL ENTITY STATUS
(37 C.F.R. §§ 1.9(f) AND 1.27(c)) - SMALL BUSINESS CONCERN 

I hereby declare that I am 

[ ] the owner of the small business concern identified below: 

[X] an official of the small business concern empowered to act on behalf of the concern 
identified below: 

NAME OF CONCERN Excess Bandwidth Corporation 

ADDRESS OF CONCERN 10670 North Tantau Avenue 

Cupertino, CA 95014 

I hereby declare that the above-identified small business concern qualifies as a small business 
concern as defined in 13 C.F.R. § 1.21 for purposes of paying reduced fees under Sections 41(a)
and 41(b) of Title 35, United States Code, in that the number of employees of the concern,
including those of its affiliates, does not exceed 500 persons. For purposes of this statement, (1) 
the number of employees of the business concern is the average, over the previous fiscal year of 
the concern, of the persons employed on a full-time, part-time, or temporary basis during each of 
the pay periods of the fiscal year, and (2) concerns are affiliates of each other when either, directly 
or indirectly, one concern controls or has the power to control the other, or a third party or parties 
controls or has the power to control both. 

I hereby declare that rights under contract or law have been conveyed to and remain with the small 
business concern identified above with regard to the invention entitled RELAXED, MORE OPTIMUM
TRAINING FOR MODEMS AND THE LIKE, described in

[X] the specification filed herewith 

[ ] Application No. , filed . 

[ ] Patent No. , issued . 

If the rights held by the above-identified small business concern are not exclusive, each individual, concern, or 
organization having rights to the invention is listed below,* and no rights to the invention are held by any person, 
other than the inventor, who would not qualify as an independent inventor under 37 C.F.R. § 1.9(c), or by any 
concern that would not qualify as either a small business concern under 37 C.F.R. § 1.9(d) or a nonprofit 
organization under 37 C.F.R. § 1.9(e). 

*NOTE: Separate verified statements are required from each named person, concern, or 
organization having rights to the invention averring to their status as small entities. (37 C.F.R. 
§ 1.27.) 
RELAXED, MORE OPTIMUM TRAINING FOR
MODEMS AND THE LIKE 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to filter adaptation, for example in 
communications such as wireline communications. 



2. State of the Art

Broadband communications solutions, such as HDSL2/G.SHDSL
(High-Speed Digital Subscriber Line), are increasingly in demand. The ability to
achieve high data rates (e.g., 1.5 Mbps and above) between customer premises and
the telephone system central office over existing (unconditioned) telephone lines
requires exacting performance. Various components of a high-speed modem that
contribute to this performance require training, e.g., a timing section (PLL, or
phase lock loop), an adaptive equalizer, and an adaptive echo canceller. Typically,
these components are all trained in serial fashion, one after another, during an
initial training sequence in which known data is transmitted between one end of the
line and the other.

Equalization is especially critical for HDSL2/G.SHDSL modems, which
are required to operate over various line lengths and wire models, over wirelines
with and without bridge taps, and under extremely divergent cross-talk scenarios.
In general, intersymbol interference (ISI), which equalization aims to eliminate, is
the limiting factor in XDSL communications. Hence, good equalization, characterized
by the ability to accurately compute the optimal channel equalizer coefficients at
the start-up phase of the modem and adaptively update those coefficients to
accommodate any change in the level of cross-talk, is essential to any
HDSL2/G.SHDSL system.

Known training methods for high-speed modems suffer from various
disadvantages. Existing commercial products invariably use a Least Mean Squares
(LMS) training algorithm, which is assumed to converge to an optimal training
solution. The LMS algorithm is well known and has generally been found to be
stable and easy to implement. Conventional wisdom holds that the steady-state
performance of LMS cannot be improved upon. Despite the widespread use of LMS
and its attendant advantages, the adequacy of performance of LMS is being tested
by the performance requirements of high-speed modems.

Nor are the alternatives to LMS particularly appealing. Other proposed
algorithms have chiefly been of academic interest. The Recursive Least Squares
(RLS) algorithm, for example, requires a far shorter training time than LMS
(potentially one tenth the training time needed for LMS), but RLS entails
far greater computational complexity. If N is the total number of taps in an
adaptive filter, then the complexity of RLS is roughly N^2, as compared to 2N for
LMS. Also, RLS is less familiar and less tractable, and it suffers from stability
problems.

An improved RLS algorithm ("fast RLS") considerably reduces the
computational complexity of RLS, from N^2 to 28N. The original fast RLS algorithm
is described in Falconer and Ljung, Application of Fast Kalman Estimation to
Adaptive Equalization, IEEE Transactions on Communications, Vol. COM-26, No.
10, October 1978, incorporated herein by reference. The fast RLS algorithm,
however, requires that training be performed on contiguous data symbols. If
training is performed "on-line," then a high-performance processor is required to
perform training computations at a rate sufficient to keep pace with the data rate,
e.g., 1.5 Mbps or greater. Although the computational demand (demand for MIPS)
"spikes up" during training, once training is completed, computational demands are
modest. If training is performed "off-line" using stored data samples, then the
processor need not keep up with the data rate, reducing peak performance
requirements. However, a potentially long sequence of training data must be stored
to satisfy the requirement of the algorithm for contiguous data, requiring a sizable
memory. Again, the memory requirement, like training itself, is transient. Once
training has been completed, the need for such a large memory is removed.

Apart from training, because communications channels vary over time,
continuous or periodic filter adaptation is required. In the case of rapidly varying
channel conditions, as in wireless communications and especially mobile wireless
communications, and in the case of especially long filters relative to adaptation
processing power, the use of RLS is indicated. In wireline communications, these
conditions are typically not present. Even in the demanding case of
HDSL2/G.SHDSL, filter lengths are moderate and channel variation can be
considered to be slow. To applicant's knowledge, all wireline modems use LMS
"on-line" for non-training filter adaptation.

Although the error criteria used by the LMS and RLS algorithms differ, the
prevalent mathematical analysis of these algorithms suggests that the algorithms
converge to the same solution, albeit at different rates. LMS uses mean squared
error, a statistical average, as the error criterion. RLS eliminates such statistical
averaging. Instead, RLS uses a deterministic approach based on squared error (note
the absence of the word mean) as the error criterion. In effect, instead of the
statistical averaging of LMS, RLS substitutes temporal averaging, with the result
that the filter depends on the number of samples used in the computation. Although
the prevalent mathematical analysis predicts equivalent performance for the two
algorithms, the mathematical analysis for LMS is approximate only. Although a
mathematically exact analysis of LMS has recently been advanced, the
overwhelming complexity of that analysis defies any meaningful insight into the
behavior of the algorithm and requires numeric solution.
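
As a point of reference, the two error criteria contrasted above are commonly written as shown below. The symbols are standard textbook notation assumed for this illustration and are not taken from the specification: e(n) is the equalizer error at sample n, E[.] is the statistical expectation, and λ (0 < λ ≤ 1) is an exponential weighting factor.

```latex
J_{\mathrm{MS}} = E\!\left[\, |e(n)|^{2} \,\right],
\qquad
J_{\mathrm{LS}}(n) = \sum_{i=1}^{n} \lambda^{\,n-i}\, |e(i)|^{2}
```

The first criterion is an ensemble average. The second is a weighted sum over the samples actually observed, which is why the resulting filter depends on the number of samples n used in the computation.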
There remains a need, particularly in high-speed wireline communications,
for a filter adaptation solution that overcomes the foregoing disadvantages, i.e., that
achieves greater optimality without requiring undue computational resources.

SUMMARY OF THE INVENTION

The present invention, generally speaking, uses adaptation based on a least
squares error criterion to improve performance of a wireline modem or the like. In
accordance with one aspect of the invention, a high-speed, broadband, wireline
modem includes an adaptive equalizer having both a training mode and a
decision-directed, non-training mode, the adaptive equalizer including a memory for
storing received signal samples; a forward path coupled to receive the signal
samples, the forward path including a forward filter and a decision element; a
feedback path coupled between an output of the decision element and an input of the
decision element, the feedback path including a feedback filter; wherein the
combined length of the forward filter and the feedback filter is moderate relative to
adaptation processing power; and an adaptation circuit or processor for adapting the
forward filter and the feedback filter based on a least squares error criterion, as
distinguished from a least mean squares error criterion. A lower noise floor is
thereby achieved. The resulting improved noise margin may be used to "buy"
greater line length, better quality of service (QoS), higher speed using denser
symbol constellations, greater robustness in the presence of interference or noise,
lower-power operation (improving interference conditions) or any combination of
the foregoing. In accordance with another aspect of the invention, an adaptation
algorithm based on the least squares error criterion is provided for use during
training of a high-speed, broadband modem. The algorithm converges to a more
optimal solution than LMS. Furthermore, the algorithm achieves a high level of
robustness with decreased computational complexity as compared to known
algorithms. The algorithm is well-suited for fixed-point implementation.
Significantly, unlike known algorithms, the algorithm allows for reinitialization and
the use of non-contiguous data. This feature allows for a wide spectrum of system
initialization strategies to be followed, including strategies in which training of
multiple subsystems is interleaved to achieve superior training of multiple
subsystems and hence the overall system, strategies tailored to meet a specified
computational budget, etc.



BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention may be further understood from the following
description in conjunction with the appended drawings. In the drawings:

Figure 1 is a block diagram showing portions of a wireline communications
transceiver with which the present invention may be used;

Figure 2 is a block diagram of a decision feedback equalizer of Figure 1;

Figures 3A-3D are graphs illustrating superior steady-state performance
achieved using the RLC-fast algorithm;

Figure 4 is a chart summarizing the original RLS-fast algorithm and the
computational complexity of the algorithm;

Figure 5 is a chart summarizing the RLC-fast algorithm and the
computational complexity of the algorithm;

Figure 6 is a diagram illustrating conventional training using a single
contiguous block of data;

Figure 7 is a diagram illustrating the inability of conventional adaptation
techniques to operate on discontiguous blocks of data;

Figure 8 is a diagram illustrating the ability, afforded in accordance with one
aspect of the present invention, to operate on discontiguous blocks of data;

Figure 9 is a diagram illustrating further particulars of the reinitialization
of Figure 8; and

Figure 10 is a flowchart illustrating cooperative, interdependent training of
multiple subsystems in accordance with one aspect of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to Figure 1, a block diagram is shown illustrating portions of a
wireline communications transceiver with which the present invention may be used.
The wireline communications transceiver includes a control section, a transmit
section, a receive section, and a hybrid section. Within the transceiver, particularly
the receive section, may be various subsystems that require training by the control
section, for example, a PLL (which may be of digital implementation), an echo
canceller and an adaptive equalizer. The training of these subsystems is an
interdependent process. For example, some initial training of the PLL may be
required prior to any other training. This initial training, however, may not achieve
as good results as may be obtained following some training of one or more of the
other subsystems, i.e., the echo canceller and adaptive equalizer. As described
hereinafter, the present training methods allow for coordinated, interdependent
training of multiple sub-systems, offering the potential of substantially improving
overall system performance.

Since ISI, which equalization aims to eliminate, is typically the limiting 
factor in XDSL communications, the focus of the following description will be 
equalizer training. The same principles, however, may be applied to the training of 
various different communications subsystems. 

Referring to Figure 2, a block diagram is shown of an adaptive
decision-feedback equalizer (DFE) suitable for use in the wireline communications
transceiver of Figure 1. An input signal from a communications line (e.g., an
HDSL2/G.SHDSL line) is 2x oversampled. The communications line forms the
channel for which equalization is to be performed. The samples are input to an
adaptive feedforward filter. In conjunction with the filter, a decimation operation is
performed. The resulting data stream is applied to a decision element, or "slicer,"
which produces an output of the equalizer. The output is applied to an adaptive
feedback filter, an output of which is summed into the input to the slicer. The DFE
structure per se is known. Data decisions are filtered by the feedback filter to
eliminate ISI arising from previous pulses. Because the feedback filter compensates
for this "past" ISI, the feedforward filter need only compensate for "future" ISI.
The equalizer of Figure 2 differs from conventional equalizers in that the filter
adaptation is performed using a variant of RLS, Reinitializable Low Complexity fast
least squares (RLC-fast), described hereinafter.
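
The signal flow of the DFE just described can be sketched in C as follows. This is a minimal illustration only, under stated assumptions: the function names (`dfe_step`, `slicer`) are hypothetical, a two-level slicer stands in for the multi-level decision element of a real modem, floating-point arithmetic is used for readability, and the 2x oversampling and decimation are assumed to have been handled by the caller when it fills the sample window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-level slicer: nearest of {-1, +1}. */
static double slicer(double v)
{
    return v >= 0.0 ? 1.0 : -1.0;
}

/* One equalizer output. ffe/x: feedforward taps and received-sample window.
 * fbe/past: feedback taps and previous decisions (past[0] is most recent).
 * The feedback filter output is summed into the slicer input. */
double dfe_step(const double *ffe, const double *x, size_t ffe_len,
                const double *fbe, double *past, size_t fbe_len)
{
    assert(ffe_len > 0 && fbe_len > 0);

    double soft = 0.0;
    for (size_t i = 0; i < ffe_len; i++)
        soft += ffe[i] * x[i];          /* "future" ISI handled here */
    for (size_t i = 0; i < fbe_len; i++)
        soft += fbe[i] * past[i];       /* "past" ISI handled here */

    double decision = slicer(soft);

    /* Shift the decision history and store the new decision. */
    for (size_t i = fbe_len - 1; i > 0; i--)
        past[i] = past[i - 1];
    past[0] = decision;
    return decision;
}
```

The adaptation of the two tap vectors, which is the subject of the remainder of the description, is deliberately left out of this sketch.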

An important, even startling, discovery of the present inventors is that
RLS-type algorithms, apart from converging faster, converge to a lower noise floor
than the LMS algorithm. That is, better equalization can be performed using the
RLS-type algorithms than with LMS. This result is illustrated in Figure 3. Only in
the exacting environment of high-speed, wide-band wireline modems such as
HDSL2/G.SHDSL does this important difference come to the fore. In fact,
experiments have shown that in this environment, even if an adaptive filter is set to
a near-optimal solution obtained using an RLS-type algorithm, if the LMS algorithm
is then used, the filter settings will actually diverge from the near-optimal solution.
A great incentive therefore exists to use an RLS-type algorithm instead of the
prevalent LMS algorithm. Impediments to the use of RLS-type algorithms in this
environment include computational complexity and instability.

Although the computational complexity of the fast RLS algorithm is greatly
reduced, it remains significant. The computational complexity of adaptation is
measured in terms of the number of multiplications and/or divisions required per
filter coefficient times the total number of filter coefficients N for the structure.
Although the present invention may be used with equalizers of other structures and
in other applications of adaptive filters, the invention will be described with respect
to the exemplary embodiment of Figure 2.

Whereas the original fast RLS algorithm requires 28N multiplications and a
matrix inversion, the computational complexity of the present "RLC-fast"
algorithm is 22N multiplications and 2 divisions. This improvement in
computational efficiency is achieved by efficiently rewriting the original algorithm.
Note that there are algorithms with computational complexity as low as 17N;
however, they are very susceptible to error accumulation, and are hard to stabilize
without the use of additional correction terms. In the case of a fixed-point equalizer
implementation, stability is crucial for overall system reliability. The computational
complexity of RLC-fast is reduced without significantly degrading the stability of
the algorithm.
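
For a sense of scale, the multiplication counts quoted in this description can be evaluated for a filter of N = 100 taps. The value of N is chosen here only for illustration, and the figures are per coefficient update.

```latex
\begin{aligned}
\text{RLS:}\quad & N^{2} = 10\,000 \\
\text{original fast RLS:}\quad & 28N = 2\,800 \\
\text{RLC-fast:}\quad & 22N = 2\,200 \\
\text{LMS:}\quad & 2N = 200
\end{aligned}
```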

Referring to Figure 4, a chart summarizing the original fast RLS algorithm
is shown.

Referring to Figure 5, a corresponding chart summarizing the RLC-fast
algorithm is shown. Figure 5 uses a notation similar to, but different from, that of
Figure 4, as set forth in the following table:



Table 1

[Table 1 is only partly legible in the source. Recoverable entries: column
headings "Prior Art Algorithm" and "New Algorithm"; A_fast; backward predictor
coefficients, D_fast; a note on "rearranging" / rearrangement.]

Derivation of the RLC-fast algorithm from the original algorithm and the
computational advantages of the RLC-fast algorithm are described in detail in
Appendix B.

A fixed-point implementation of the RLC-fast algorithm is desirable to
reduce computational load and hence increase the speed of the algorithm, as well as
to avoid the cost and increased power consumption of a floating-point processor.
Because of the underlying stability issues of RLS-type algorithms, such a fixed-point
implementation must be carefully considered. The binary point cannot be assumed
to be at the beginning just after the sign bit (i.e., with all numbers within [-1, 1)),
because for some of the internal variables the actual values may become larger
than 1, and those variables would saturate.

Key elements for successful implementation of the RLC-fast algorithm
include: (1) appropriate scaling of the input variables; (2) the position of the binary
point for internal variables; (3) efficient internal scaling of the variables after
multiplication and division to reduce loss of precision; (4) complete analysis of the
dynamic range of the various internal variables; and (5) judicious choice of delta (δ₁)
and lambda (λ₁) for convergence speed and stability.

A currently preferred implementation assumes 32-bit precision for all the
variables, with all the numbers being of signed integer form. The integer numbers
are given a fixed-point interpretation in which the leading bit is the sign bit,
followed by a 5-bit integer part and a 26-bit fractional part. Multiplication and
division are performed assuming the foregoing interpretation of the integer
variables. Two divisions occur per update. Both are computed as 1/(1 + x)
instead of 1/x to reduce the loss in precision.
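
The number format described above (sign bit, 5-bit integer part, 26-bit fractional part) can be illustrated with the following C sketch. It is not the Appendix A implementation: the helper names `q_mul` and `q_recip_one_plus` are hypothetical, `FIXED_ONE` is assumed to be the fixed-point representation of 1.0, and a 64-bit intermediate is assumed to be available for the product and the quotient.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 26                            /* 26-bit fractional part */
#define FIXED_ONE ((int32_t)1 << FRAC_BITS)     /* 1.0 in this format */

/* Multiply two fixed-point values: form the 64-bit product, then shift the
 * binary point back. Right-shifting a negative value is assumed to be an
 * arithmetic shift, as it is on common compilers. */
int32_t q_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* Compute 1/(1 + x) for fixed-point x. The denominator must exceed 2^-5
 * so that the result stays inside the 5-bit integer range. */
int32_t q_recip_one_plus(int32_t x)
{
    int64_t denom = (int64_t)FIXED_ONE + (int64_t)x;
    assert(denom > ((int64_t)1 << 21));
    return (int32_t)((((int64_t)FIXED_ONE) << FRAC_BITS) / denom);
}
```

With this layout the representable range is roughly [-32, 32) with a resolution of 2^-26, which leaves headroom for internal variables whose magnitude exceeds 1.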

A more detailed description of the RLC-fast algorithm is given in Appendix
A (implemented in fixed-point arithmetic for the DSP TI-C6x).

Due to the high data rate of the HDSL2/G.SHDSL system, for moderate-size
problems (N about 100), the RLC-fast algorithm, even with its reduced complexity,
poses a high computational burden on a typical processor (say, an X MIPS
processor). In many modems, RLC-fast will be executed only once at the start-up
phase of the modem and will not be used in the steady-state, which is the normal
operating state for the modem.

Hence, although it may be feasible to deploy a high-speed, power-hungry
DSP for on-line execution of RLC-fast, such a measure adversely impacts power
consumption and may not be cost effective. As a result, off-line implementation of
RLC-fast will often be the preferred alternative.

However, off-line implementation itself raises problems. The RLS-type
algorithm requires a certain data length in order to converge to a near-optimal value.
The convergence time is a function of the so-called forgetting factor. An aggressive
choice of the forgetting factor can be used to reduce the required data length but at
the cost of stability.

A reasonable choice for the forgetting factor may require a long data length
(say, 100N) for convergence. This in turn implies a large storage requirement even
for a moderate-size problem. Once again, if this memory is only used during the
start-up phase, a straightforward implementation wastes large amounts of silicon
and results in an inefficient design.

The original fast RLS algorithm offers no solution to the foregoing problem.
Referring more particularly to Figure 6, the original fast RLS algorithm requires the
input data stream to be contiguous. If there is a break in the input data stream, the
only way to use the new data in the original approach is to restart the algorithm all
over again as illustrated in Figure 7. Of course, the algorithm can be trained with
smaller size blocks of data, but only at the cost of reduced performance. That is, the
advantage of Figure 3 would be sacrificed.
To circumvent the requirement of a contiguous data stream, RLC-fast uses a
re-initialization scheme that allows the use of a non-contiguous data block without
restarting the algorithm. At start-up, the algorithm is initialized in the usual way.
However, the algorithm can be stopped at any time and started at a later time with a
new initialization. This manner of operation is illustrated in Figure 8. No
difference in performance is observed if individual data blocks are not too small
(say, no smaller than 10N). Hence, storage requirements may be reduced by an
order of magnitude (e.g., 10N instead of 100N).

The particulars of re-initialization are illustrated in Figure 9. Instead of
setting the intermediate variables to zero or a scaled identity matrix, the previous
values are used for all variables except X_fast. The variables A_fast, F_fast,
K_fast, b_n, D_fast and C_fast are all stored for this purpose.
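
The distinction between a cold start and a re-initialization can be sketched in C as follows. This is an illustrative sketch under assumptions: the structure and function names are hypothetical, the dimensions are arbitrary small values, floating point is used for brevity, and the data vector X_fast is simply cleared on re-initialization because the text does not say how it is re-seeded for the new block.

```c
#include <assert.h>
#include <string.h>

#define N 8   /* illustrative total number of taps */
#define M 3   /* width of the per-update data block, as in Appendix A */

typedef struct {
    /* Variables carried over from one data block to the next. */
    double A[N * M], D[N * M], F[M * M], K[N], bn[N], C[N];
    /* Data vector: not carried over between non-contiguous blocks. */
    double X[N];
} rlc_state;

/* Cold start: all variables zero except F = delta1 * I. */
void rlc_init(rlc_state *s, double delta1)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    for (int i = 0; i < M; i++)
        s->F[i * M + i] = delta1;
}

/* Re-initialization before a new, non-contiguous block: keep A, D, F, K,
 * bn and C from the previous block; only X is reset (assumed cleared). */
void rlc_reinit(rlc_state *s)
{
    assert(s != NULL);
    memset(s->X, 0, sizeof s->X);
}
```

Because only the stored state has to survive between blocks, the sample buffer itself can be reused for each new block.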

The foregoing re-initialization capability allows for a store/process mode of
operation. More particularly, even with the reduced complexity of RLC-fast, the
amount of computation required for real-time processing of moderate-size problems
can be prohibitive for most DSPs due to the high data rate of the system. To
alleviate this problem, a store/process mode of operation is followed in which,
during the first half of a cycle, a small block of data (e.g., size 10N) is stored, and
during the second half of the cycle, the data is processed to update the filter
coefficients. Because the data is stored, each update need not be finished within the
sample time T. Instead, the computation can be distributed over multiple sample
periods.

One approach is to partition the computation of the update for each data
sample into segments small enough that an individual segment can be finished in
one sample time. The smaller the partition, the less processing is required in each
sample period, but the total time to finish the update increases. Hence, store/process
operation, along with partitioning of the update computation, provides a flexible
mechanism that allows for a trade-off between processing load and total time to
process a data block. Without the capability of re-initialization, this flexibility is not
obtainable.
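
The partitioning idea can be illustrated on the simplest building block of the update, an inner product of length n. The sketch below is an assumption-laden illustration rather than the scheduling used in any actual implementation: the type and function names are hypothetical, and `budget` stands for the number of taps that can be processed within one sample period.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t next;   /* index of the next tap to process */
    double acc;    /* partial sum carried across sample periods */
} partial_dot;

/* Process at most `budget` taps of an n-tap inner product.
 * Returns 1 once all n taps have been accumulated, 0 otherwise. */
int partial_dot_step(partial_dot *p, const double *a, const double *b,
                     size_t n, size_t budget)
{
    assert(p != NULL && budget > 0);

    size_t end = p->next + budget;
    if (end > n)
        end = n;
    for (size_t i = p->next; i < end; i++)
        p->acc += a[i] * b[i];
    p->next = end;
    return p->next == n;
}
```

A smaller `budget` lowers the work per sample period and raises the number of periods needed to finish, which is the trade-off described above.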

The same flexibility may be extended from the adaptive equalizer or other
isolated sub-system to the system as a whole, in such a way as to achieve not only
great flexibility but also improved performance. In reality, the performance of each
sub-system is interdependent on the performance of other sub-systems and should
not be viewed in isolation. Referring to Figure 10, for example, the performance of
the clock recovery circuit of the PLL block is influenced by the performance of the
echo canceller and vice versa. The same is true for the performance of the echo
canceller and the equalizer. The better the echo cancellation is, the better the
equalizer performance will be. As a consequence, there will be fewer errors in the
decisions. These more accurate decisions can be used to improve the echo
cancellation performance. This improved echo cancellation can in turn be utilized to
reduce the jitter of the PLL.
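
One way to picture such interdependent training is a loop that repeatedly gives each subsystem a short training pass, so that each pass benefits from the latest state of the others. The following C sketch is purely illustrative and is not the flow of Figure 10: the function names are hypothetical, the fixed round-robin order is an assumption, and `count_call` is a stand-in stage that only counts how often it is invoked.

```c
#include <assert.h>
#include <stddef.h>

/* A training stage, e.g. PLL, echo canceller or equalizer (hypothetical). */
typedef void (*train_fn)(void *ctx);

/* Stand-in stage used for illustration: increments a counter. */
void count_call(void *ctx)
{
    (*(int *)ctx)++;
}

/* Run `passes` rounds, giving each stage one short pass per round.
 * With re-initializable adaptation, a stage can resume on a new,
 * non-contiguous data block each time it is called. */
void interleaved_training(train_fn *stages, void **ctx,
                          size_t n_stages, size_t passes)
{
    assert(stages != NULL && ctx != NULL && n_stages > 0);

    for (size_t p = 0; p < passes; p++)
        for (size_t s = 0; s < n_stages; s++)
            stages[s](ctx[s]);
}
```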

It will be appreciated by those of ordinary skill in the art that the invention 
can be embodied in other specific forms without departing from the spirit or 
essential character thereof. The presently disclosed embodiments are therefore 
considered in all respects to be illustrative and not restrictive. The scope of the 
invention is indicated by the appended claims rather than the foregoing description, 
and all changes which come within the meaning and range of equivalents thereof are 
intended to be embraced therein. 
APPENDIX A
A Detailed Description of EBC-Fastalgo

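Step 1 Initialization:

\[
\begin{array}{lll}
A_{\mathrm{fast}} & = 0, & (N \times 3) \\
D_{\mathrm{fast}} & = 0, & (N \times 3) \\
F_{\mathrm{fast}} & = \delta_1 I, & (3 \times 3) \\
K_{\mathrm{fast}} & = 0, & (N \times 1) \\
X_{\mathrm{fast}} & = 0, & (N \times 1) \\
C_{\mathrm{fast}} & = 0, & (N \times 1) \\
\mathrm{decision} & = 0. & (1 \times 1)
\end{array}
\]
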

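Step 2 Data Acquisition:

\[
\begin{aligned}
p_3 &= [\,X_{\mathrm{fast}}(FFE_{LEN}-2)\;\; X_{\mathrm{fast}}(FFE_{LEN}-1)\;\; X_{\mathrm{fast}}(N-1)\,], \\
\mathrm{eps}_2 &= [\,y(2);\; y(1);\; \mathrm{decision}\,], \\
\mathrm{decision} &= \mathrm{tx\_signal}.
\end{aligned}
\]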

Step 3 Update Computation:

\[ e = \mathrm{eps}_2 + A_{\mathrm{fast}}^{T} X_{\mathrm{fast}}. \]

e[0]=eps2[0];
e[1]=eps2[1];
e[2]=eps2[2];
for (ind=0; ind<N; ind++)
{
    e[0]+=mpy_n(A_fast[ind],X_fast[ind]);
    e[1]+=mpy_n(A_fast[ind+N],X_fast[ind]);
    e[2]+=mpy_n(A_fast[ind+N*2],X_fast[ind]);
}

\[ A_{\mathrm{fast}} = A_{\mathrm{fast}} - K_{\mathrm{fast}}\, e^{T}. \]

for (ind=0; ind<N; ind++)
{
    ind2=ind+N;
    ind3=ind+N*2;
    A_fast[ind]-=mpy_n(K_fast[ind],e[0]);
    A_fast[ind2]-=mpy_n(K_fast[ind],e[1]);
    A_fast[ind3]-=mpy_n(K_fast[ind],e[2]);
}



\[ e_p = \mathrm{eps}_2 + A_{\mathrm{fast}}^{T} X_{\mathrm{fast}}. \]

Equivalently,

\[ e_p = e\,(1 - K_{\mathrm{fast}}^{T} X_{\mathrm{fast}}). \]

temp_fast=inp(K_fast,X_fast,N);
temp_fast=FIXED_ONE-temp_fast;
ep[0]=mpy_n(temp_fast,e[0]);
ep[1]=mpy_n(temp_fast,e[1]);
ep[2]=mpy_n(temp_fast,e[2]);



\[ F_{\mathrm{fast}} = \lambda_1 F_{\mathrm{fast}}. \]

mat_scal(lambda,F_fast,MM,F_fast);



\[
\begin{aligned}
F_{\mathrm{fast}} &= F_{\mathrm{fast}} - \left(F_{\mathrm{fast}}\, e_p\, e^{T} F_{\mathrm{fast}}\right)\left(1 + e^{T} F_{\mathrm{fast}}\, e_p\right)^{-1}, \\
c_n &= F_{\mathrm{fast}}\, e_p.
\end{aligned}
\]

We use the following equivalent computation:

\[
\begin{aligned}
h &= \frac{F_{\mathrm{fast}}\, e_p}{1 + e^{T} F_{\mathrm{fast}}\, e_p}, \\
F_{\mathrm{fast}} &= F_{\mathrm{fast}} - h\, e^{T} F_{\mathrm{fast}}.
\end{aligned}
\]

mat_cvec(F_fast,ep,M,temp_vecM);
temp_fast=inp(e,temp_vecM,M);
temp_fast=div_wor(temp_fast+FIXED_ONE);
mat_scal(temp_fast,temp_vecM,M,temp_vecM);
mat_rvec(F_fast,e,M,temp_vecM1);
outp(temp_vecM,temp_vecM1,M,temp_matM);
for (ind=0; ind<MM; ind++)
{
    F_fast[ind]=F_fast[ind]-temp_matM[ind];
}

\[ \mathrm{tempvec} = F_{\mathrm{fast}}\, e_p. \]

mat_cvec(F_fast,ep,M,temp_vecM);

\[ b_n = K_{\mathrm{fast}} + A_{\mathrm{fast}}\, c_n. \]

for (ind=0; ind<N; ind++)
{
    bn[ind]=K_fast[ind];
    bn[ind]+=mpy_n(A_fast[ind],temp_vecM[0]);
    bn[ind]+=mpy_n(A_fast[ind+N],temp_vecM[1]);
    bn[ind]+=mpy_n(A_fast[ind+N*2],temp_vecM[2]);
}

\[ m = [\,c_n(1:2);\; b_n(1:FFE_{LEN}-2);\; c_n(3);\; b_n(FFE_{LEN}+1:N-1)\,]. \]

m_vec[0]=temp_vecM[0];
m_vec[1]=temp_vecM[1];
for (ind=2; ind<FFE_LEN; ind++)
{
    m_vec[ind]=bn[ind-2];
}
m_vec[FFE_LEN]=temp_vecM[2];
for (ind=FFE_LEN+1; ind<N; ind++)
{
    m_vec[ind]=bn[ind-1];
}



\[ \mu = [\,b_n(FFE_{LEN}-1:FFE_{LEN});\; b_n(N)\,]. \]

mu[0]=bn[FFE_LEN-2];
mu[1]=bn[FFE_LEN-1];
mu[2]=bn[N-1];

\[ X_{\mathrm{fast}} = [\,\mathrm{eps}_2(1:2);\; X_{\mathrm{fast}}(1:FFE_{LEN}-2);\; \mathrm{eps}_2(3);\; X_{\mathrm{fast}}(FFE_{LEN}+1:N-1)\,]. \]

for (ind=FFE_LEN-1; ind>1; ind--)
{
    X_fast[ind]=X_fast[ind-2];
}
X_fast[1]=eps2[1];
X_fast[0]=eps2[0];
for (ind=FBE_LEN-1; ind>0; ind--)
{
    X_fast[ind+FFE_LEN]=X_fast[FFE_LEN+ind-1];
}
X_fast[FFE_LEN]=eps2[2];

\[ \eta = p_3 + D_{\mathrm{fast}}^{T} X_{\mathrm{fast}}. \]

eta[0]=p3[0];
eta[1]=p3[1];
eta[2]=p3[2];
for (ind=0; ind<N; ind++)
{
    eta[0]+=mpy_n(D_fast[ind],X_fast[ind]);
    eta[1]+=mpy_n(D_fast[ind+N],X_fast[ind]);
    eta[2]+=mpy_n(D_fast[ind+N*2],X_fast[ind]);
}



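\[
\begin{aligned}
D_{\mathrm{fast}} &= \left(D_{\mathrm{fast}} - m\,\eta^{T}\right)\left(I + \frac{\mu\,\eta^{T}}{1 - \eta^{T}\mu}\right), \\
K_{\mathrm{fast}} &= m - D_{\mathrm{fast}}\,\mu.
\end{aligned}
\]

We use the equivalent computation:

\[
\begin{aligned}
K_{\mathrm{fast}} &= \frac{m - D_{\mathrm{fast}}\,\mu}{1 - \eta^{T}\mu}, \\
D_{\mathrm{fast}} &= D_{\mathrm{fast}} - K_{\mathrm{fast}}\,\eta^{T}.
\end{aligned}
\]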

temp_fast=inp(eta,mu,M);
temp_fast=div_wor(FIXED_ONE-temp_fast);
for (ind=0; ind<N; ind++)
{
    ind2=ind+N;
    ind3=ind+N*2;
    K_fast[ind]=m_vec[ind];
    K_fast[ind]-=mpy_n(D_fast[ind],mu[0]);
    K_fast[ind]-=mpy_n(D_fast[ind2],mu[1]);
    K_fast[ind]-=mpy_n(D_fast[ind3],mu[2]);
    K_fast[ind]=mpy_n(K_fast[ind],temp_fast);
}

for (ind=0; ind<N; ind++)
{
    ind2=ind+N;
    ind3=ind+N*2;
    D_fast[ind]-=mpy_n(K_fast[ind],eta[0]);
    D_fast[ind2]-=mpy_n(K_fast[ind],eta[1]);
    D_fast[ind3]-=mpy_n(K_fast[ind],eta[2]);
}

\[ x_{\mathrm{est}} = C_{\mathrm{fast}}^{T} X_{\mathrm{fast}}. \]

xest=inp(X_fast,C_fast,N);

\[ \mathrm{error} = \mathrm{decision} - x_{\mathrm{est}}. \]

error_xest=decision-xest;

\[ C_{\mathrm{fast}} = C_{\mathrm{fast}} + \mathrm{error} \cdot K_{\mathrm{fast}}. \]

if (((gdat.fastbuff_offset)*2)>FFE_LEN)
{
    for (ind=0; ind<N; ind++)
    {
        C_fast[ind]+=mpy_n(K_fast[ind],error_xest);
    }
}
APPENDIX B

Step-by-step derivation of the RLC-fast algorithm from the fast algorithm
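Step 1 Initialization:

\[
\begin{array}{lll}
A_{\mathrm{fast}} & = 0, & (N \times 3) \\
D_{\mathrm{fast}} & = 0, & (N \times 3) \\
F_{\mathrm{fast}} & = \delta_1 I, & (3 \times 3) \\
K_{\mathrm{fast}} & = 0, & (N \times 1) \\
X_{\mathrm{fast}} & = 0, & (N \times 1) \\
C_{\mathrm{fast}} & = 0, & (N \times 1) \\
\mathrm{decision} & = 0. & (1 \times 1)
\end{array}
\]

The only difference is in the definition of F_fast. Instead of defining the variable E_fast, we
define its inverse, i.e., \( F_{\mathrm{fast}} = E_{\mathrm{fast}}^{-1} \). Hence, \( \delta_1 = 1/\delta \).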

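Step 2 Data Acquisition:

\[
\begin{aligned}
p_3 &= [\,X_{\mathrm{fast}}(FFE_{LEN}-2)\;\; X_{\mathrm{fast}}(FFE_{LEN}-1)\;\; X_{\mathrm{fast}}(N-1)\,], \\
\mathrm{eps}_2 &= [\,y(2);\; y(1);\; \mathrm{decision}\,], \\
\mathrm{decision} &= \mathrm{tx\_signal}.
\end{aligned}
\]

This phase is identical to the data acquisition phase of the fast algorithm.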
Step 3 Update Computation:

The first two equations are identical:

$$e = \mathrm{eps2} + A_{fast}^T X_{fast},$$
$$A_{fast} = A_{fast} - K_{fast} e^T.$$

Fast algorithm:

$$e_p = \mathrm{eps2} + A_{fast}^T X_{fast}.$$

Using the last relation for $A_{fast}$, we write

$$\begin{aligned}
A_{fast}^T X_{fast} &= (A_{fast} - K_{fast} e^T)^T X_{fast} \\
&= A_{fast}^T X_{fast} - e K_{fast}^T X_{fast} \\
&= e - \mathrm{eps2} - e K_{fast}^T X_{fast},
\end{aligned}$$

where in the last equation we used the definition of $e$. Substituting this expression for
$A_{fast}^T X_{fast}$ back into the equation for $e_p$, we get an equivalent expression for $e_p$:

$$e_p = e(1 - K_{fast}^T X_{fast}).$$

Motivation: Reduction in computation from $mN$ to $N + m$.
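The equivalence of the two forms of $e_p$ can be checked numerically. The following is a minimal sketch only, not the fixed-point listing of Appendix A: it uses double precision, assumes $m = 3$, and assumes $A_{fast}$ is stored column by column (column j starting at offset j*N, as the indexing in the Appendix A listing suggests). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define M 3 /* number of channels, m in the text (assumed to be 3) */

/* Direct form: e_p = eps2 + (A - K e^T)^T X.
   `a` holds A before its update; the updated A is formed on the fly.
   A is N x M, stored column by column (column j starts at a[j * n]). */
static void posterior_error_direct(const double *a, const double eps2[M],
                                   const double e[M], const double *k,
                                   const double *x, size_t n, double e_p[M])
{
    assert(n > 0);
    for (size_t j = 0; j < M; j++) {
        double acc = eps2[j];
        for (size_t i = 0; i < n; i++)
            acc += (a[j * n + i] - k[i] * e[j]) * x[i];
        e_p[j] = acc;
    }
}

/* Reduced form: e_p = e (1 - K^T X).
   N multiplies for the dot product plus M for the scaling. */
static void posterior_error_reduced(const double e[M], const double *k,
                                    const double *x, size_t n, double e_p[M])
{
    assert(n > 0);
    double kx = 0.0;
    for (size_t i = 0; i < n; i++)
        kx += k[i] * x[i];
    for (size_t j = 0; j < M; j++)
        e_p[j] = e[j] * (1.0 - kx);
}
```

Given the same inputs, with e computed as eps2 + A^T X from the old A, both functions should return the same e_p up to rounding.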
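Fast algorithm:

$$b_n = K_{fast} + A_{fast} E_{fast}^{-1} e_p.$$

First notice that, with $c_n = E_{fast}^{-1} e_p$,

$$b_n = K_{fast} + A_{fast} E_{fast}^{-1} e_p = K_{fast} + A_{fast} c_n.$$

Now,

$$c_n = E_{fast}^{-1} e_p = (\lambda E_{fast} + e_p e^T)^{-1} e_p = \frac{\bar{\lambda} E_{fast}^{-1} e_p}{1 + e^T \bar{\lambda} E_{fast}^{-1} e_p},$$

where in the last step we used the matrix inversion lemma and $\bar{\lambda} = 1/\lambda$; on the right-hand side $E_{fast}$ denotes the value before the update. Substituting
$F_{fast} = E_{fast}^{-1}$ and rearranging, we get

$$c_n = \frac{\bar{\lambda} F_{fast} e_p}{1 + e^T \bar{\lambda} F_{fast} e_p}.$$

After resequencing, we obtain

$$b_n = K_{fast} + A_{fast} c_n.$$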

Motivation: Reduction in computation from $mN + 4m^2$ multiplications and an $(m \times m)$
matrix inversion to $mN + 4m^2 + 2m$ multiplications and a scalar division. This also improves
the stability of the algorithm.
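The replacement of the $(m \times m)$ inversion by a scalar division can be sketched as follows. This is an illustration under stated assumptions, not the listing of Appendix A: it assumes the inverse being tracked belongs to a matrix updated as E <- lambda*E + e_p*e^T (inferred from the use of the matrix inversion lemma in the derivation), uses double precision rather than fixed point, takes $m = 3$, and stores $A_{fast}$ column by column. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define M 3 /* number of channels, m in the text (assumed to be 3) */

/* Rank-one update of F = E^{-1}, assuming E <- lambda * E + e_p * e^T.
   On return, c = E_new^{-1} e_p and f holds E_new^{-1}.
   Only one scalar division by `denom` is needed; no matrix is inverted. */
static void update_f_and_c(double f[M][M], const double e[M],
                           const double e_p[M], double lambda, double c[M])
{
    double g[M]; /* g = (1/lambda) F e_p   */
    double h[M]; /* h^T = e^T (1/lambda) F */
    double denom = 1.0;

    assert(lambda > 0.0);
    const double lam_inv = 1.0 / lambda;

    for (int i = 0; i < M; i++) {
        g[i] = 0.0;
        h[i] = 0.0;
        for (int j = 0; j < M; j++) {
            g[i] += lam_inv * f[i][j] * e_p[j];
            h[i] += lam_inv * e[j] * f[j][i];
        }
    }
    for (int i = 0; i < M; i++)
        denom += e[i] * g[i]; /* 1 + e^T (1/lambda) F e_p */
    assert(denom != 0.0);

    for (int i = 0; i < M; i++)
        c[i] = g[i] / denom;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            f[i][j] = lam_inv * f[i][j] - c[i] * h[j];
}

/* b_n = K + A c, with A stored column by column (column j at a[j * n]). */
static void extend_gain(const double *k, const double *a, const double c[M],
                        size_t n, double *b)
{
    for (size_t i = 0; i < n; i++) {
        double acc = k[i];
        for (size_t j = 0; j < M; j++)
            acc += a[j * n + i] * c[j];
        b[i] = acc;
    }
}
```

A quick sanity check is to multiply the updated f by lambda*E + e_p*e^T and confirm the product is the identity to within rounding.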



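Next, four equations are identical:

    $m = [c_n(1:2);\ b_n(1:\mathrm{FFE\_LEN}-2);\ c_n(3);\ b_n(\mathrm{FFE\_LEN}+1:N-1)]$,
    $\mu = [b_n(\mathrm{FFE\_LEN}-1:\mathrm{FFE\_LEN});\ b_n(N)]$,
    $X_{fast} = [\mathrm{eps2}(1:2);\ X_{fast}(1:\mathrm{FFE\_LEN}-2);\ \mathrm{eps2}(3);\ X_{fast}(\mathrm{FFE\_LEN}+1:N-1)]$,
    $\eta = p_3 + D_{fast}^T X_{fast}$.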
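The repartitioning of the extended gain into an N-element vector m and a 3-element vector mu amounts to a fixed shuffle of entries. A minimal sketch follows, assuming the index ranges in the equations above are 1-based and inclusive, that b_n has N entries and c_n has 3, and that 2 <= FFE_LEN < N. The function name and argument order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define M 3 /* number of channels, m in the text (assumed to be 3) */

/* Builds m = [c(1:2); b(1:FFE_LEN-2); c(3); b(FFE_LEN+1:N-1)] and
   mu = [b(FFE_LEN-1:FFE_LEN); b(N)], reading the ranges as 1-based.
   Arrays here are 0-based, so every index is shifted down by one. */
static void repartition(const double c[M], const double *b, size_t n,
                        size_t ffe_len, double *m_vec, double mu[M])
{
    size_t out = 0;

    assert(ffe_len >= 2 && ffe_len < n);

    m_vec[out++] = c[0];
    m_vec[out++] = c[1];
    for (size_t i = 0; i + 2 < ffe_len; i++) /* b(1 : FFE_LEN-2) */
        m_vec[out++] = b[i];
    m_vec[out++] = c[2];
    for (size_t i = ffe_len; i + 1 < n; i++) /* b(FFE_LEN+1 : N-1) */
        m_vec[out++] = b[i];
    assert(out == n);

    mu[0] = b[ffe_len - 2]; /* b(FFE_LEN-1) */
    mu[1] = b[ffe_len - 1]; /* b(FFE_LEN)   */
    mu[2] = b[n - 1];       /* b(N)         */
}
```

The same shuffle, with eps2 in place of c_n and the old data vector in place of b_n, would produce the new $X_{fast}$, provided the result is written to a separate buffer rather than in place.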

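Fast Algorithm:

$$D_{fast} = (D_{fast} - m\eta^T)(I - \mu\eta^T)^{-1},$$
$$K_{fast} = m - D_{fast}\mu.$$

Substituting the first relation for $D_{fast}$ into the second equation and using the matrix inversion
lemma, we get

$$\begin{aligned}
K_{fast} &= m - (D_{fast} - m\eta^T)(I - \mu\eta^T)^{-1}\mu \\
&= m - (D_{fast} - m\eta^T)\left(I + \frac{\mu\eta^T}{1 - \eta^T\mu}\right)\mu \\
&= m - (D_{fast} - m\eta^T)\left(\mu + \frac{\mu\,\eta^T\mu}{1 - \eta^T\mu}\right) \\
&= m - \frac{(D_{fast} - m\eta^T)\mu}{1 - \eta^T\mu} \\
&= \frac{m - D_{fast}\mu}{1 - \eta^T\mu}.
\end{aligned}$$

Now,

$$\begin{aligned}
D_{fast} &= (D_{fast} - m\eta^T)(I - \mu\eta^T)^{-1} \\
&= (D_{fast} - m\eta^T)\left(I + \frac{\mu\eta^T}{1 - \eta^T\mu}\right) \\
&= D_{fast} + \frac{D_{fast}\mu\eta^T}{1 - \eta^T\mu} - m\eta^T\left(1 + \frac{\eta^T\mu}{1 - \eta^T\mu}\right) \\
&= D_{fast} + \frac{D_{fast}\mu\eta^T}{1 - \eta^T\mu} - \frac{m\eta^T}{1 - \eta^T\mu} \\
&= D_{fast} - \frac{(m - D_{fast}\mu)\eta^T}{1 - \eta^T\mu} \\
&= D_{fast} - K_{fast}\eta^T,
\end{aligned}$$

where in the last step we substituted the expression for $K_{fast}$. Hence,

$$K_{fast} = (m - D_{fast}\mu)/(1 - \eta^T\mu),$$
$$D_{fast} = D_{fast} - K_{fast}\eta^T.$$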

Motivation: Reduction in computation from $(m+2)mN + m^2$ multiplications and an $(m \times m)$
matrix inversion to $(2m+1)N + m$ multiplications and a scalar division.
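The reduced update can be sketched in floating point as below. This is an illustration only, not the fixed-point listing of Appendix A: it uses double precision, assumes $m = 3$, stores $D_{fast}$ column by column, and uses a hypothetical function name. The multiply count matches the figure above: mN for the product of D with mu, m for the inner product of eta and mu, N for the scaling by the reciprocal, and mN for the rank-one correction of D.

```c
#include <assert.h>
#include <stddef.h>

#define M 3 /* number of channels, m in the text (assumed to be 3) */

/* K = (m - D mu) / (1 - eta^T mu), then D <- D - K eta^T.
   D is N x M, stored column by column (column j starts at d[j * n]).
   One scalar division replaces the M x M matrix inversion. */
static void update_k_and_d(const double *m_vec, const double mu[M],
                           const double eta[M], double *d, double *k,
                           size_t n)
{
    double denom = 1.0;
    for (size_t j = 0; j < M; j++)
        denom -= eta[j] * mu[j];
    assert(denom != 0.0);
    const double inv = 1.0 / denom; /* the single scalar division */

    for (size_t i = 0; i < n; i++) {
        double acc = m_vec[i];
        for (size_t j = 0; j < M; j++)
            acc -= d[j * n + i] * mu[j];
        k[i] = acc * inv;
    }
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < n; i++)
            d[j * n + i] -= k[i] * eta[j];
}
```

One way to validate the sketch without inverting a matrix is to confirm that the new D times (I - mu eta^T) equals the old D minus m eta^T, and that K equals m minus the new D times mu.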

The last three equations are identical:

$$x_{est} = C_{fast}^T X_{fast},$$
$$\mathrm{error} = \mathrm{decision} - x_{est},$$
$$C_{fast} = C_{fast} + \mathrm{error}\, K_{fast}.$$
What is claimed is:

1. A high-speed, broadband, wireline modem including an adaptive 
equalizer having both a training mode and a decision-directed, non-training mode, 
the adaptive equalizer comprising: 

at least one of: a forward path coupled to receive the signal samples,
the forward path including a forward filter and a decision element, and a
feedback path coupled between an output of the decision element and an
input of the decision element, the feedback path including a feedback filter;
and

means for adapting the one of said forward filter and said feedback
filter based on a least squares error criterion, as distinguished from a least
mean squares error criterion.

2. The apparatus of Claim 1, further comprising a memory for storing
received signal samples.

3. The apparatus of Claim 1, comprising both said feedforward path and 
said feedback path. 

4. The method of Claim 1, wherein the means for adapting operates
during decision-directed mode.

5. The method of Claim 1, wherein the combined length of the forward 
filter and the feedback filter is moderate relative to adaptation processing power. 

6. The method of Claim 1, wherein adaptation is performed using 
fixed-point arithmetic. 
7. The method of Claim 1, wherein said means for adapting performs
substantially the following computation: 

$$e_p = e(1 - K_{fast}^T X_{fast}).$$



8. The method of Claim 1, wherein said means for adapting performs 
substantially the following computations: 



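$$c_n = \frac{\bar{\lambda} F_{fast} e_p}{1 + e^T \bar{\lambda} F_{fast} e_p},$$

$$F_{fast} = \bar{\lambda} F_{fast} - c_n e^T \bar{\lambda} F_{fast},$$

$$b_n = K_{fast} + A_{fast} c_n.$$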






PATENT 

ATTORNEY'S DOCKET NO. 032478-005 

9. The method of Claim 1, wherein said means for adapting performs 
substantially the following computations: 

$$K_{fast} = (m - D_{fast}\mu)/(1 - \eta^T\mu),$$

$$D_{fast} = D_{fast} - K_{fast}\eta^T.$$

10. The method of Claim 1, wherein a routine for updating said one of 
said forward filter and said feedback filter performs no more than 22N multiplies, 
where N is the number of filter taps, and wherein no distinct stabilization quantity is 
computed. 

11. A method of performing adaptation of an adaptive filter using an 
LS-type adaptation algorithm, comprising: 

storing a first block of data; 
initializing the adaptation algorithm; 

processing the first block of data using the adaptation algorithm; 
storing a second block of data; and 

re-initializing the adaptation algorithm using results from processing 
the first block of data. 

12. The method of Claim 11, wherein the adaptive filter is used in a 
communication system in which symbols are communicated, one symbol per symbol 
period, further comprising: 

starting and finishing processing of a block of data within a portion of 
a single symbol period, a remaining portion of the symbol period being used 
for other computations. 
13. The method of Claim 12, further comprising selectably fixing said
remaining portion by fixing a size for the data blocks. 

14. A method of modem training in which the modem includes multiple 
subsystems provided with respective adaptive filters, comprising:

partially training the adaptive filter of a first subsystem by applying a 
first training algorithm to first data to obtain first filter settings; 

at least partially training the adaptive filter of a second subsystem;
and

completing training of the adaptive filter of the first subsystem by
using for initialization at least some of said first filter settings and applying
the first training algorithm to second data not contiguous with said first data
in a received data stream.
ABSTRACT

A high-speed, broadband wireline modem using the least squares error 
criterion rather than a least mean squares criterion is described. 
